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We consider an asexual population evolving on rugged fitness landscapes which are defined on 
the multi-dimensional genotypic space and have many local optima. We track the most populated 
genotype as it changes when the population jumps from a fitness peak to a better one during the 
process of adaptation. This is done using the dynamics of the shell model which is a simplified 
version of the quasispecies model for infinite populations and standard Wright-Fisher dynamics 
for large finite populations. We show that the population fraction of a genotype obtained within 
the quasispecies model and the shell model match for fit genotypes and at short times, but the 
dynamics of the two models are identical for questions related to the most populated genotype. 
We calculate exactly several properties of the jumps in infinite populations some of which were 
obtained numerically in previous works. We also present our preliminary simulation results for 
finite populations. In particular, we measure the jump distribution in time and find that it decays 
as t~ 2 as in the quasispecies problem. 
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I. INTRODUCTION 



The question of mode in evolution, especially in the context of speciation has engaged the attention of many 
evolutionists for over a century and continues to do so The issue is whether evolution occurs by smooth gradual 
changes (gradualism) as put forwarded by Darwin, or sudden large jumps (punctuationism) Some examinations 

of fossil record indicate that new species can arise by either mode, or even by a combination of the two Q. However, 
a complete and unambiguous answer is hard to obtain at the level of macroevolution due to the incompleteness and 
irreproducibility of the fossil data, fn the last decade or so, researchers have become interested in carrying out long- 
term evolution experiments in the laboratory 5\. Typically one starts with a microbial population maladapted to 
, a given environment such as a colony of starving bacteria, and track its evolutionary trajectories for thousands of 

■ generations as it undergoes adaptive changes. The results of such experiments have been found to be consistent with 
| both modes of evolution. For instance, large populations of RNA virus starting from a low fitness ancestor have been 

seen to gain fitness in a continuous manner Q . On the other hand, the fitness of bacteria E. Coli 3, Q and RNA 
>J ' virus 06 @ show a punctuated pattern of evolution. 

Theoretically, these results are understood using the concept of fitness landscape [T3, EH E2 defined as a map from 

■ the genotypic space into the real numbers. If the fitness landscape is smooth and single-peaked, starting from a low 
fitness state the population fitness increases gradually until it has reached the peak value [HI]. The dynamics are 
different for a population moving on a rugged fitness landscape with multiple peaks 14 1 . In this case, the population 
fitness increases smoothly until the population encounters a local fitness peak where it gets trapped as a better peak is 

^ • separated by a fitness valley. The population thus enters the stasis phase and waits until a favorable mutation allows 
it to shift to a better peak where it can again get localised and so on. Thus, the dynamics alternate between periods 
of stasis and rapid changes in fitness when the population jumps from a peak to an even better one. An example of 
' this behavior is shown in Fig. [TJ 

That the fitness landscape underlying the evolutionary process is multi-peaked is supported by several experiments 
@j H, EE Ell E3, El| ■ Besides, these landscapes include biologically important and ubiquitous epistatic interactions [l9| 
as the contribution of individual genes to the fitness of the genotype is not independent [l4[ • The class of landscapes 
considered in this work are maximally rugged and characterised by strong selection [20(. In statistical physics, such 
rugged landscapes with a large number of local optima have appeared in the context of spin glass theory. Examples 
include random energy model [2l| and Sherrington-Kirkpatrick model [22[ where the energy of a spin configuration 
plays the role of genotypic fitness and the metastability in the spin glass dynamics can be viewed as punctuated 
equilibrium |23| . 

We are interested in the statistics of jumps which occur when the population fitness changes rapidly. As illustrated 
in Fig.[T] an unambiguous way of defining a jump at time t is the change in the fitness or identity of the most populated 
genotype at t. The problem of such leadership changes arises in several other contexts such as change in the position 
of the optimal end point of directed polymer 24] , highest degree node in a growing network [25l |26| , and velocity and 
position of the front particle in a single-file system [27], [28J . In the following sections, before analysing the realistic case 
of finite populations, we will study the infinite population limit in detail. We consider the dynamics of Eigen's model 
[29l ] that describes the population dynamics of self-replicating molecules which at low mutation rates and large times 
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FIG. 1: (Color online) Punctuated change in the population fitness (x) and the fitness of the most populated genotype (+) in 
a single realization of a maximally rugged fitness landscape. Here L = 6, N — 2 14 ,/i = 10~ 6 ,p(VK) = W~ 2 . 



form a dynamic and heterogeneous quasispecies consisting of fittest genotype and its closely related mutants. The 
existence of such an error threshold is a generic result seen in both simple and complex htness landscapes, and has 
been reviewed in [3(| (for recent related works, see [3l|, HH, [HI ) • The focus of this article is however the dynamics 
of the quasispecies model. We show that the population fraction at a genotype obtained within a simplified shell 
model [35[ (also see (36[) of quasispecies dynamics is good only for highly fit sequences and at short times. However, 
the behavior of the most populated genotype is captured correctly by the shell model. We calculate exactly some 
properties of the jumps (as defined above) within the shell model which were obtained numerically in previous works 
[35L [37L l38l|. Specifically, we show that the jump distribution decays as 1/i 2 in time and l/Vk as a function of the 
distance k from the starting genotype for a class of fitness landscapes. 

The basic difference between a finite and infinite population is that while the former has a finite mutational spread 
in the genotypic space, all the mutants are available at all times in the quasispecies limit. Due to this reason, at 
large times, a finite population gets trapped at a local fitness peak and the jump probability is governed by the rate 
of stochastic tunneling [3^, H(| which allows the population to cross the fitness valley via few low fitness mutants. 
In the quasispecies model on the other hand, the reproduction or selection plays the more important role than the 
production of mutants and a jump occurs when the population at a fit genotype overtakes the less fitter one. Although 
the underlying physical processes responsible for a jump are different for finite and infinite populations, we find that 
the jump distribution is robust in that it decays as 1/i 2 in both the cases. 

The plan of the paper is as follows. In the next section, we define a class of mutation-selection models. Section ITTT1 
discusses the dynamics of quasispecies model and shell model. We calculate exactly the statistics of jumps within the 
shell model in Section UVl These results for infinite population limit are compared with those obtained in simulations 
of finite population in Section [Vl Finally, we conclude with a summary of our results. 




II. MODELS 



We consider a haploid, asexual population of size N each of whose constituents carry a string a = {a\, ...,ol} of 
length L where a\ can assume I > 2 values. The string a can represent a genetic sequence (£ = 4), protein (£ = 20) 
or a sequence of L loci with I alleles [30|. For simplicity, we will deal with binary sequences for which i = 2 and 
<7j — or 1. The environment of the population is represented by a fitness landscape which is obtained by associating 
a non-negative real number W{a) to each sequence. In this article, we consider fitness W(a) to be a random variable 
chosen independently from a common distribution p(W). This generates a maximally rugged fitness landscape with 
an exponentially large number (in L) of local maxima [HI, E| and strong interactions amongst the loci [20]. The 
population evolves on this fitness landscape in discrete time via selection and mutation, and we ignore other factors 
responsible for genetic mixing such as recombination. Our model is thus applicable to microorganisms like E. Coli 
and Hepatitis C virus which have zero or very low recombination rate [42| . 

Consider a population well adapted to a given environment localised around a peak of the fitness landscape. A 
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change in the environment brings about a change in the fitness landscape, and the population will typically find itself 
in a fitness valley. We consider the adaptation process of the population starting from such an initial condition. The 
population fraction X(a,t) of sequence a at time t is iterated using Wright-Fisher dynamics defined as follows. In 
each generation, an offspring selects a parent p with probability W p (a)/N(W) where W p (a) is the fitness of the parent 
p with sequence a and (W) = Y^ a ' W(a')X(a' , t) is the average fitness of the population. Then the probability P(n) 
that parent p has n offspring in one generation is given as 



P{n) 



V\ (W p (a)\ n f W p (a)\ N - r 



\N(W)J \ N(W)J 



This implies that the average number of offspring produced equals W(a)/(W) and hence the fitness W(a) has the 
physical meaning that it is proportional to the number of descendants produced per generation. Further, as the 
relative variance in the offspring number decays as 1/iV, it follows that the offspring number fluctuates from one 
generation to another for finite population but one can ignore these fluctuations arising due to random sampling for 
N — ► oo. After the reproduction process, point mutations are introduced independently at each locus of the sequence 
a' with probability \i per generation. Thus, a sequence a is obtained via mutations with the probability 

p CT ^=/i^')(l-/i) L - d(CT,</) (1) 

where the Hamming distance d(a,a') is the number of point mutations in which the sequences a and a' differ. 

Since the selection process is not stochastic for infinite populations, the population frequency also does not fluctuate 
and one can work with the average frequency X(a,t) — (X(a,t)) where the averaging is over all realizations of the 
sampling process. This leads to a deterministic nonlinear equation for the fraction X(a,t) (20j . 

where the denominator is the normalization constant. 



III. DYNAMICS OF QUASISPECIES AND SHELL MODEL 

The focus of this section is the quasispecies model and the related shell model. In the following, we will mainly 
work with the unnormalised population Z(a,t) defined as [30] 

t-i 

Z(a,t)=X(a,t)'[[Y f W(a , )X(a , ,t) (3) 

t=0 a 1 

in terms of which the nonlinear evolution ([2]) reduces to the following linear iteration 

Z(a,t + l) = J2p^a'W(a')Z(a',t) . (4) 

(7 

Since at the start of the adaptation process the population finds itself at a low fitness genotype, we start with the 
initial condition X(a, 0) = Z(a, 0) = 5 a a .(o) where a^ ' is a randomly chosen sequence. For small mutation probability 
/j (~ 10~ 3 — 10~ 10 ) as seen in various asexual microbes [13, S3], after one iteration we have 

Z(a,l) = ^^W(a^) . (5) 

Thus each sequence gets populated in one generation with a fraction which is same for all the sequences in a shell 
of constant Hamming distance d{a, a^) from the initial sequence <r^ ^ (35[. Numerical simulations of [35| showed 
that dynamical properties involving the most populated genotype such as the distribution of evolution times and 
number of jumps are very well described by a simplified model which ignores mutations for further evolution and 
allows the population at each sequence to grow with its own fitness. Thus within the shell model, the population 

Z(a,t) - (i^^W^a) for t > 1. 

Here we provide an analytical understanding of the quasispecies model leading to the shell model. We will find 
that the expression for Z(a, t) in shell model is a good approximation for sequences with high fitness and at short 
times. For t > 1, consider ((4| for a sequence a in the shell d centred about and at a Hamming distance d(a, a^) 
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from the center. The sum on the right hand side of (|4|) has three kind of terms: (i) sequence a does not mutate (i.e. 
a 1 = a in the sum) so that its contribution oc /x d ( <T ' <T< ° ) ), (ii) a mutation occurs in a' 6 / where / is the set of er"s 

lying inside shell d that satisfy d(cr, a 1 ) + d(a' , a^) — d(cr, a^) resulting in ^ d (< T ' cr< ) dependence and (hi) a mutation 
occurs either in the sequences in the inner shells that do not belong to set / or, in the sequences in and outside shell 
d giving a term of order ^ d (< 7 >< y{0) )+ 1 and higher. In order to obtain Z(a,t) to order /j, d ( cr ^ i ), we can neglect the last 
contribution and iterate the population according to 

Z(a, t + 1) = W(a)Z{a, t) + W(a')Z(a', t) . (6) 

a' el 

The above equation is still coupled but one can make further simplifications by proceeding as follows. Let us first 
consider the zeroth shell i.e. sequence <r ( °> for which we immediately have 

Z(a(°\t) = W*(<r (0) ) . (7) 
For the sequences in the first shell for which d(a, a^ ) = 1, the sequence is the only member of set I and we have 

Z(a,t+l)=W(a)Z(a,t)+fiW t+1 {a (0) ) , d(a,a (0) ) = l (8) 
Defining Z(a,t) — W t (a)z(t), we obtain a simple difference equation for z(t) which can be iterated to give 

2 WV°)) 
z(t) = fj,r + fir , r — 



l-r ' W(o) ' 

If W(a) > W(aW), the population Z(a, t) grows exponentially fast with a rate equal to its own (log) fitness whereas 
in the opposite case, this growth rate is that of the initial sequence. For the sequences in the first shell, one therefore 
obtains 

^.*)=tei ) r" l(<r),r<i w 

v ' ; [fiW^a^) , r > 1 . w 

For the succeeding shells, one can work out the population Z(a,t) in a manner similar to above and find that 
Z(a,t) ~ ^ d ( <T ' CT< ° ) )W /t (cr) if the fitness W(a) is larger than that of all the sequences in set /. Such a random variable 
whose value is larger than all the ones preceding it defines a record [37l l38l]. If the fitness W(a) is not a record, then 
the growth rate is given by the largest (log) fitness of sequences in set I i.e. the last record value. 

A comparison of the results obtained using above approximations with the exact iteration of ^ is shown in Fig. [2] 
for the initial sequence and some representative sequences in the first shell. The disagreement at large times occurs 
because ([7]) and (J9j) are obtained to leading order in \i and after some time as the population at the outer shells grow, 
the next order terms in \i start contributing. For the zeroth shell, this happens at time r when the next order terms 
fiZ(a, t) due to sequences in shell one become as large as the lowest order term given by ([7]) i.e. W T (a^) ~ fj, 2 W T (a*) 
where W(<j*) > W(a^) is the largest fitness in the shell 1. Plugging in the fitness values of relevant sequences in 
this expression for the fitness landscape in Fig. [21 we obtain r « 8 in good agreement with the exact iteration. After 
time t, the population at grows with the fitness W(o~*) until the shell 2 starts contributing. This argument can 
be generalised to higher shells straightforwardly. For instance, the correction to © is of order ^ 3 which arises either 
due to the sequences in the first shell but two mutations away from cr or the sequences in the second shell that are 
one mutational distance away. The larger of the two contributions can be then used to estimate the time at which 
the growth rate changes. This process of slope changes of logarithmic population goes on until the globally fittest 
sequence becomes most populated after which the population Z(a 1 t) ~ fj, d ^ a ' ,T ^W t (a^). 

In Fig. [21 the initial sequence with rank 788 remains most populated until ~ 5 time steps after which the sequence 
ranked 23 overtakes it and becomes the next leader. We are interested in such leadership changes and will use the shell 
model for this purpose as it correctly captures the dynamical behavior of the most populated genotype. Recall that 
in the shell model, the population at each sequence grows with its own intrinsic fitness for all t > 1. This population 
dynamics are different from quasispecies in which the population of a sequence whose fitness is not a record grows 
with the fitness of the last record (in inner shells) at short times and the growth rate of the sequences change as the 
leading genotype changes in course of time. However, since the population at such sequences is always at least order 
/j, smaller than that at the leading genotype, the dynamics of the most populated sequence are not affected if the 
population at such sequences is also allowed to grow with their respective fitness. 
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FIG. 2: (Color online) Comparison of the logarithmic population fraction In Z(a, t) obtained by iterating exact equation Q 
(shown by points) and the shell model dynamics (0 and (shown by lines) for fi = 1CP 4 . The fitness W is chosen from a 
common exponential distribution for a sequence of length L = 10. The subscript in Z gives the fitness rank where the largest 
fitness carries rank and the population initially starts at a sequence with rank 788. All the other sequences are at Hamming 
distance 1 from cr' '. 




After these considerations, we arrive at the shell model in which the logarithmic population obeys the following 
linear time evolution 

\nZ{a,t) = lnW((rW) - | lnfi\d(a, a®) + In W{a) (t- 1). (10) 

Calling F(a) = In W(a) and rescaling time by | In //| , we have 

E(a,t) =-d{cr,<T (a) ) + F(a) t (11) 

where we have absorbed the extraneous factors in the definition of logarithmic population E(a, t). Since the population 
at t = 1 is same for sequences at constant d(a, a^) — k, only the sequence with the largest fitness in shell k need 
to be considered [35| . If the logarithmic fitness F is distributed according to the distribution p(F), then the largest 
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fitness F(k) in shell k which is the maximum of a k = (jrj) variables is distributed independently but non-identically 
with distribution 



Pk (F) = a kP (F) dF' p(F')j (12) 

where -F m i n is the lower support of the distribution p(F). 

IV. STATISTICS OF JUMPS IN SHELL MODEL 

In the following, we will consider the shell model dynamics defined as 

E(k,t) = -k + F(k) t (13) 

where F(k) and E(k,t) respectively are the fitness and population of the sequence with the largest fitness in shell 
k = 0, ...,L. Figure [3] shows the population E(k,t) as a function of time for k — 0, ...,4 as the global maximum of 
the fitness landscape in Fig. [5] lies at a Hamming distance 4 from the initial sequence. Since a line E(k,t) intersects 
E(k', t) at a time 

_ k - k 

F(k') - F(k) ' V ' 

and as F(l) > F(0), the population of the sequence in shell is overtaken by the one in shell 1. However to become 
a most populated genotype, it is not sufficient to have a record fitness. As Fig. [3] shows although F(2) > F(l), the 
population E(2,t) overtakes E(l,t) later than E(3,t) losing the evolutionary race. Thus, only those set of sequences 
become leader that manage to overtake the current leader in the least time amongst all contenders. Finally, the 
globally fittest sequence £/(4,t) catches up with E(3,t) and the population localises at the global peak (35l.[38j. 

The shell model defined by (fT3"|) describes a population growing linearly in time with a slope equal to the fitness 
F(k) chosen from the distribution p k {F). A simpler version of this model in which the fitness F(k) are assumed to 
be independent and identically distributed (i.i.d.) variables with distribution p(F) has also been considered l35l. l38l]. 
Several properties of this model have been recently calculated via a mappingto a first-passage problem [44j | and 
considering it as a system of hard-core particles undergoing elastic collisions [27| . In particular, it has been shown 
that the average number of jumps grows as /31nL where the prefactor f3 < 1 and depends on the choice of p(F). In 
both of these approaches, the initial condition (the intercept) of E(k, t) is not fixed and is a uniformly distributed 
random number on the real line. In this article, we present a way to calculate average number J of changes in k* 
which respects the discreteness of the underlying genotypic space. We will perform the calculations for the shell model 
for which the fitness is non-identically distributed, although our method is readily applicable to the i.i.d. model also. 
However, we mention that the prefactor (3 calculated using our method turns out to be the same as in the analysis of 

MM. 



A. General formulae 



To calculate the statistics of jumps, we need two basic distributions: (i) the probability P k (F,t) that the most 
populated sequence in shell k with fitness F is the leader at time t and (ii) the probability Wy , k (F,t)dt with which 
this sequence is overtaken by the most populated sequence in shell k'(> k) between time t and t + dt. The distribution 
Pk(F,t) requires that the population E(j,t) < E(k,t), j ^ k which implies that the fitness F(j) < F + (j — k)/t. 
Since the fitnesses of the most populated sequence in each shell are independent random variables, we have 

L fP+^r 1 

P k {F,t) = Pk (F) Yl dF'p 3 {F'). (15) 

To find W k ',h(F, t), we need to determine the fitness of the most populated sequence in shell k'{> k) which can 
contribute to the overtaking event. Let us denote the location of the leader at time t by E(t), 



E(t) = 



-k + F t 
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Then the sequence in shell k' can overtake the fcth one at t if the fitness F(k') = (E(t) + k')/t. Similarly, the sequence 
in the kth shell can be overtaken at t + dt, dt/t — ► if 

F(k') = E{t ^l +k ' = F + - + Oftf) • 

Thus the probability that a sequence in shell fc is overtaken by a sequence in shell kl between time t and t + dt is 
given by 

t)di = f Pk (F) « pfc, (V + ^— - ) dt,k'>k. (16) 

Finally using the distributions defined in (jT5j) and (|16[) . we can write the probability Vk' ,k(t) that the most populated 
sequence in shell k' overtakes the one in shell k at time t as 

Vk',k( t )= dF W k ,. k {F,t) P k (F,t) (17) 

where _F max is the upper support of the distribution p(F). 

Depending on the quantity of interest, one can either integrate over time or sum over a space variable in Vk',k(t). 
Often the experimental data such as a morphological feature [J] or average fitness [|[ are plotted as a function of time 
and can be used to find the number of jumps in time. Therefore it is useful to consider the distribution Sf{t) of a 
jump to occur at time t which can be found by summing Vk'.k(t) over k and k', 

L L 
fc=0 k'=k 

The relationship between the overtaken and the overtaking sequence can be deduced by computing the jump distri- 
bution J k that the most populated sequence in shell k is a jump given by 



k>=k Jo 



dt V k >,k(t) ■ ( 19 ) 



The average number J of jumps can be obtained by either integrating J{€) over time or summing J k over k. 

In the previous works on shell model [35l r3TL |38| . several properties of the jumps have been studied numerically 
when the fitness F is distributed according to an exponential or Gaussian distribution. In the following subsections, 
we will use the expressions derived above for the exponential case which lends itself to a detailed analysis and then 
give some results for fitness distributions decaying faster than an exponential. 



B. Exponentially distributed fitness 

We consider p(F) — e~ F for which the largest fitness in shell k is distributed according to 

p k (F) =a k e-^l-e^r*- 1 . (20) 
Using this in (TT5]) , we obtain the distribution that the leader with fitness F is in shell k, 

P k (F,t) = a & (-^)n(l-e-^r 

w at e e ' (21) 



where the last expression is obtained by exponentiating the product and keeping only the leading order terms in the 
expansion. Similarly, the overtaking rate (|16p can be written as 



t 2 
k'-k 



«* ( '^F ) e- F+k ~^ e-^- F+ ~ * . (22) 
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FIG. 4: (Color online)Jump statistics for p(F) — e~ F . Left: Distribution Pk(t) given by (I25|l for fixed k/L = 0.3 in support 
of (Hp for L = 500 (solid) and 1000 (dotted). The inset shows the 1/t 2 dependence of V k {t) for k = 450, L = 1000. Right: 
Distribution V k ',k given by <j32j for k = 50, 75 and 100 (left to right) and L = 1000. 



Then the probability that the population in the fc'th shell exceeds the population in the fcth shell at time t is given by 

Vw, k {t) = a k ay e^ j\z z e~' \^ (23) 

Neglecting the first term in the exponent of the exponential in the integrand and carrying out the integral for large 
L, we finally obtain 

(l + e-~) L (l + e-~) L V t J 
We will now use this expression to calculate jump statistics. 

Temporal behavior-. Let us first consider the distribution Vk{t) of a jump to occur in the fcth shell at time t. Summing 
Vk',k(t) over k', we obtain [45[ 

W) = °* £ (k< - k) a k ,e-± = C "!, 2Lr y + .^:^ 2^(2, k + 1 - L; k + 2; -e~i) (25) 

t 2 (l + e~*) 2L jr^ k i 2 (l + e~t) 2L T[L - k)V{k + 2) 

where T(n) is the gamma function and 2-F1 (a, b; c; z) is the hypergeometric function. We point out that the distribution 
Vk{t) gives the probability that E(k,t) is overtaken at t and hence differs from the distribution that E(k,t) is the 
largest at time t considered in [38]. The function Vk{t) is plotted as a function of time for various values of k in 
Fig. [4] To gain some insight into the behavior of this distribution, we calculate the above sum using saddle point 
approximation. Using the Stirling's formula for binomial coefficient 



L\ L L 1 



k) y 27rfc(L-fc) k k {L-k) L - k 
in l]25p for ay and approximating the sum over k' by an integral, we have 



(26) 



" dk' (k' - k) e^ « Q) A(fe)[erf(A(fc)) - erf(A(L))] + ^ [e"* 2 « - e" A 'W] 

where we have estimated the integral using the saddle point method. In the above expression, f(k') = (k' /t) — Inay , 
f"(k' Q ) is the second derivative of f(k') evaluated at the minimum k' of the function f(k') and the deviation A(fc') = 
(k' -K)Vf"(K)/2- Explicitly, 



K _ ^ , rW) . _L_ . (l±filil±£Zi) , AW _ (t _ Kh lr< ■ 
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Approximating the factor a^e - " in (|25[) also by a Gaussian in a manner similar to above, we finally have 



V k (t) = 



2 -A 2 (fe) 

2^t 2 



A(fc)[erf(A(fc)) - erf (A(£))] 



(27) 



Using this expression, it is easy to obtain the typical shell location of the sequence overtaken at time t by integrating 
kVk{t) over k. Since 'Pfc(i) is normalised to J{t), we find that given an overtaking event occurs at t, the average 
location of the overtaken sequence scales as k' with a standard deviation of order 1/ y/ f"(k' ) about it. This length 
scale can also be expressed in terms of time for fixed k as the distribution Vk{t) is maximised at A(fc) = or k' Q (t) = k. 
Thus for large L, the time T m p . at which 'Pfc(t) is most probable is given by 



J m . p . — 



In 



L-k 



(28) 



This means that a sequence with k ~ 0(L) is most likely to be overtaken in a time of order unity. This fact is also 
expressed in Fig. 2] which shows the distribution "Pfc(t) at fixed k/L as a function of time. The time scale T m p . can 
be understood b y a simple argument which estimates the intersection time given by (|10[) . This argument is analogous 
to that used in [20l l35l . [38| where the fitness difference in the denominator of (| 10[) is given by the typical value of 
the fitness gap which probes the rare events. As we are interested in the most likely events, the denominator is 
approximated by the difference in the average value of the largest fitness in shell k' and k. From (j2"D|) , we see that 
the average largest fitness goes as lnafc. Since typically the most populated sequence in shell k is overtaken by a 
sequence located 0(\fk) distance away [38| (also see below), the numerator of (fTOj) scales as \[k and the denominator 
ln(a' k /otk) on using the Stirling's formula turns out to be \/k\n((L — k)/k) thus leading to (T2"8")) . 

Although the most probable value of the overtaking time is of order one, the average overtaking time is infinite 
due to the fat tail of the distribution Vk(t). For t ^> 1, we can approximate A(fe) by (2k — V)/\f2L and using the 
asymptotic expansion of error function for large argument [4(| , 



erf (a;) = —= I dy e y = 1 — 



l-KX 



1 - 



1 

2^ 



(29) 



we obtain 



r k (t) 



2tt t 2 



" Le2 / 2 , e = 1 - ^ > . 



(30) 



Due to the t 2 behavior at large times shown in the inset of Fig. [3J the mean time diverges for any L and k < L/2 
(see below). The tail of this distribution is exponentially suppressed in L for finite e but goes as \fht It 2 for k close 
to L/2. The late time behavior above is also obtainable from (jTUJ) by a simple change of variables [H, [3^|. 
After performing the integral over k in (|27p . we obtain 



J{t) 



' — — r sech f — 
4tt t 2 \2t 



(31) 



Integrating J(t) over time from to infinity, we find that the average number of jumps grows as \J Lit/2. 
Spatial behavior-. One can also find the probability Sf{k) that the most populated sequence in the fcth shell is a 
jump. As we are not interested in temporal distribution, the integral over time in (|24p can be carried out to give the 
probability that the sequence in shell k is overtaken by that in shell k' , 



Vk',k = / dt Vk',k{t) = 



k' 



k 



iF^k + k 1 ',2L;k + k' + 1;-1) 



(32) 



This distribution is shown for some representative parameters in Fig. 2] Approximating the integrand Pk',k(t) in the 
above equation by a Gaussian centred about inverse time t^ 1 = — ln((fc' + k)/(2L — kl — k)) and carrying out the 
integral, we obtain 



L(k' - k) (L\ fL\ ( 2L 
(k + k')(2L - k' - k) \k J \k' J \k' + k 



1 + erf 



'(k + k')(2L- 



4L 



k-k'), (2L-k'-k 
m 



(33) 
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The argument of the error function changes sign when fc' + k = L. For k' + k > L, the argument is negative and of 
order \[L 1. Using (|29|) . we find that the last factor in |33|) is exponentially small in L for k' > L/2,k < k' . Thus 
the probability that the overtaking sequence k' lies beyond the shell L/2 is negligible. This is understandable as the 
globally fittest sequence is typically located in the shell k = L/2 (3a |. For k, k' < L/2, the error function in ([55)) can 
be approximated by unity, and the probability distribution Vw ,k can be further simplified to give 



1 L 


fk'-k\ 


1 irk(L - k) 


V 2fc ) 



L(k' -k) 2 
g 4k(L-k) 



k<k! < L/2 



(34) 



where we have used the Gaussian approximation for the binomial coefficients. This form of the distribution implies 
that the overtaking sequence k' is located within 0(sfk) distance of the overtaken sequence fc. Thus the typical 
spacing between successive jumps for large k is roughly constant and goes as y/~L as seen in the numerical simulations 
of [38| . The jump distribution Jk for a jump to occur in shell k is obtained by integrating over k' and we have 



Jk 



nk(L - k) 



(35) 



where 9h is the Heaviside step function. Thus the distribution Jk decays as k for k <C L in accordance with the 
numerical results of [37ll38|. Integrating the preceding equation over fc, we find that the average number of jumps are 
given as \J~Dk /2 in agreement with the previous calculation. 

C. Gumbel-distributed shell fitness 

We now consider p(F) = Ae~ F '" , 7 > 0,A = 7^(1/7) for which the distribution (F) of the largest of random 
variables for large L has the Gumbel distribution as the limiting form (47j . 



1 



Pk(F) = — e "Si 



where 



B k = 



In 



oik 



r(i/7) 



1/7 



, Cfe — — 

7 



In 



oik 



r(i/7) 



(l- 7 )/ 7 



(36) 



(37) 



We will show that the tail of the distribution Vk{t) decays as 1/t 2 and the average number of jumps scales as \[L for 
any 7 > 0. 

The large time behavior of Vk (t) can be found by taking t — * 00 limit in (p~5|) and (jT6j) except for the 1 /t 2 factor in 
rate k- We thus have 



. , A k'-k e c '« Ck 
Vk{t) w 



k'=k 



t 2 C'C k 



F — t L 

dF e _u *e ^e _ ^-j=° 



e 3 e 3 



(38) 



after using the approximations similar to those used in arriving at (|24|) . The sum under the integral sign can be 
computed by saddle point approximation so that the integral (up to scale factors) is writeable as 



dz 



1 



where we have neglected the logarithm of T(l/7) in Bk and Ck for large L. Since we are interested in large times 
when fc — > L/2, to leading order in e = 1 — (2k/L), we find lna/c/Lln2 « 1 thus simplifying the expression for Vk(t) 
to yield 



V k (t) = 



2?T7 t 2 v ; 



(39) 
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where the last expression is obtained by estimating the sum over k' using Gaussian approximation. For small e 
(~ 1/VX) and 7 = 1, this expression reduces to ([30| for exponential distribution. For 7 7^ 1, the L dependence in 
the preceding equation consists of two factors, the first one arising because the typical separation k' — k ~ 0(y/L) for 
large k (3f| [38| and the last factor is due to 1/Cfc, k = L/2 left after scaling the fitness F in (|38p by Ck'- 

In order to find the average number of jumps, we first need to calculate T m p . for arbitrary 7. As argued in the 
last subsection, T~ p is given as the derivative (with respect to k) of the average shell fitness. From the scaling form 
([55]) . we see that the average shell fitness is proportional to Bk- For k of order L, its derivative grows as Ck, k ~ L. 
Then integrating Vt(t) over k and t, we find that the average number of jumps scale as yL for all 7 > 0. This result 
is also consistent with the simulations for the Gaussian distribution in [33 . 



V. JUMP DISTRIBUTION FOR FINITE POPULATIONS 



We will now consider the dynamics of a population of N individuals evolving according to the Wright-Fisher 
dynamics described in Section [Til Unlike the infinite population, a population of size N initially localised at sequence 
er(°) can spread up to a finite distance. This is since the typical fraction of the population at a sequence a in one 
generation is given as ^ d (< 7 ' <T<0) ) but as this fraction is bounded below by 1/N, it follows that the mutational spread 
d e s = IniV/l ln/x| for a finite population. Thus while all the mutants are available in one generation for quasispecies 
(see (0), only a finite number is present at any time in real populations. If a fitter sequence is available within 
this effective distance, the population behaves like a quasispecies and evolves deterministically. This is possible for 
d c g > 1 and at short times 20]. However at long times, any finite population can get trapped at a local peak if a 
fitter sequence lies farther out than d e s- In such an event, the population escapes the local peak via the process of 
stochastic tunneling which takes a time given by [H, H(| 



AT = 



Npi 2 L 



Wioc.f - Wloc.i 

Wioc , f Wioc,i 



(40) 



for deg = 1 where Wio C ,[i,f] refers to the fitness of the initial and final local peak separated by two mutations. During 
this time, most of the population stays at the local peak with fitness Wi oc ,i but a few less-fit mutants are produced 
by single mutations. When some of these mutants further acquire a favorable mutation, then the whole population 
quickly jumps to the next local peak with higher fitness Wi oc ,f • The physical process involved when a jump occurs in 
a finite population is thus different from that in the quasispecies case. In the latter case, each local peak is already 
populated albeit with a small frequency and a jump occurs when the population at a fitter sequence overtakes the 
current leader. 

We are interested in the jump distribution of large finite populations with d e g close to one. At large times when 
the tunneling drives the dynamics, we expect the density J(t) of jumps at t to scale as 

Jit) „ _L „ NfL*!® (41) 
AT p w 2 {t) v ' 

where Wi(t) is the typical fitness of the local peak visited at t separated by a better peak with fitness difference Aw(t). 
We expect Aw to decrease and Wi to increase with time as higher peaks are explored. However in the absence of an 
argument for these time dependences, we present our preliminary numerical results here. In the shell model, one does 
not have to deal with the whole genotypic space consisting of 2 L sites and it suffices to work with the L shells thus 
reducing the computational effort enormously (3o| . However such a rotational symmetry is not present for the finite 
population problem so we are able to handle only small values of L. Our numerical results for large finite populations 
and small L are shown in Fig. [5] for exponentially distributed log fitness F or p(W) — W~~ 2 as in the last sections. 
The data in Fig. [5] are averaged over several histories as the evolutionary trajectories are not deterministic for finite 
populations [201 ] . For fixed /i and L, we find that at long times, 



N 
1 

which decays the same way as in the quasispecies model, 

2 



J(i) ~ 3 (42) 



J(t) ~ ( ^r- ) VI , t > 1 (43) 



where we have reinstated the [i dependence. 
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VI. CONCLUSIONS 



In this article, we discussed the evolution of asexual population on rugged fitness landscapes with many local 
optima separated by valleys. We focused on the statistical properties of the most populated genotype which changes 
as the population locates better peaks in the fitness landscape. These properties were calculated exactly within a 
shell model which was derived systematically from the Eigen's quasispecies model for infinite populations. We showed 
that the expression for the population frequency within shell model approximates the quasispecies solution well for 
highly fit sequences and at short times only. However, the two models are equivalent as regards the statistics of the 
most populated genotype. We computed the average number of jumps in the shell model and found that it grows as 
\TL, L being the sequence length, for fitness distributions decaying as exponential or faster. The jump distribution in 
time was shown to decay as t~ 2 . This dependence is also seen numerically for the finite population but a satisfactory 
explanation for this case is presently missing. A more detailed analysis of the finite population properties will be 
taken up in the future. 
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